Search | VHL Regional Portal

A pathway-based computational framework for identification of a new modal of multi-omics biomarkers and its application in esophageal cancer.

Zhou, Qi; Ye, Weicai; Yu, Xiaolan; Bao, Yun-Juan.

Comput Methods Programs Biomed ; 247: 108077, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38382307

ABSTRACT

BACKGROUND: The pathway-based strategy has recently been proposed for identifying biomarkers, with the advantages of higher biological interpretability and cross-data robustness than the conventional gene-based strategy. However, its utility in clinical applications has been limited by high computational complexity and ill-defined performance. OBJECTIVE: The current study presents a machine learning-based computational framework using multi-omics data for identifying a new modal of biomarkers, called pathway-derived core biomarkers, which have the advantages of both gene-based and pathway-based biomarkers. METHODS: Machine-learning methods and a gene-pathway network were integrated to select the pathway-derived core biomarkers. Multiple machine-learning algorithms were used to construct and validate the diagnostic models of the biomarkers based on more than 1400 multi-omics clinical samples of esophageal squamous cell carcinoma (ESCC). RESULTS: The classifier models based on the new modal biomarkers achieved superior performance in the training datasets, with an average AUC/accuracy of 0.98/0.95 and 0.89/0.81 for mRNAs and miRNAs, respectively, higher than the currently known classifier models based on the conventional gene-based and pathway-based strategies. In the testing cohorts, the AUC/accuracy was 6.1 %/7.3 % higher than that of the models based on the native gene-based biomarkers. The improved performance was further confirmed in independent validation cohorts. Specifically, the sensitivity/specificity increased by ~3 % and the variance decreased significantly by ~69 % compared with those of the native gene-based biomarkers.
Importantly, the pathway-derived core biomarkers also recovered 45 % more previously reported biomarkers than the gene-based biomarkers and are more functionally relevant to the ESCC etiology (involved in 14 versus 7 pathways related to ESCC or other cancers), highlighting the cross-data robustness of this new modal of biomarkers via enhanced functional relevance. CONCLUSIONS: The results demonstrated that the new modal of biomarkers not only has improved predictive performance and robustness, but also exhibits higher functional interpretability, leading to potential application in cancer diagnosis.
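
The abstract above reports classifier quality as AUC/accuracy pairs. As a reminder of what those two numbers measure, here is a minimal sketch in Python. It is not the authors' evaluation code: the function names, the 0.5 threshold, and the toy scores are all illustrative assumptions. AUC is computed as the fraction of (positive, negative) sample pairs that the classifier ranks correctly, with ties counted as half.

```python
def auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A pair counts 1 if the positive scores higher, 0.5 if tied, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of samples whose thresholded score matches the label.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


# Hypothetical classifier outputs for four samples (1 = tumor, 0 = normal).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels), accuracy(toy_scores, toy_labels))
```

AUC is threshold-free while accuracy depends on the chosen cutoff, which is one reason the two figures in a pair such as 0.98/0.95 can differ.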

Tri©DB: an integrated platform of knowledgebase and reporting system for cancer precision medicine.

Jiang, Wei; Wang, Peng-Ying; Zhou, Qi; Lin, Qiu-Tong; Yao, Yao; Huang, Xun; Tan, Xiaoming; Yang, Shihui; Ye, Weicai; Yang, Yuedong; Bao, Yun-Juan.

J Transl Med ; 21(1): 885, 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38057859

ABSTRACT

BACKGROUND: With the development of cancer precision medicine, a huge amount of high-dimensional cancer information has rapidly accumulated regarding gene alterations, diseases, therapeutic interventions and various annotations. The information is highly fragmented across multiple sources, making it challenging to utilize and exchange effectively. Therefore, it is essential to create a resource platform containing well-aggregated, carefully mined, and easily accessible data for effective knowledge sharing. METHODS: In this study, we have developed "Consensus Cancer Core" (Tri©DB), a new integrative cancer precision medicine knowledgebase and reporting system, by mining and harmonizing multifaceted cancer data sources and presenting them in a centralized platform with enhanced functionalities for accessibility, annotation and analysis. RESULTS: The knowledgebase provides the most comprehensive information currently available on cancer precision medicine, covering more than 40 annotation entities, many of which are novel and have not been explored previously. Tri©DB offers several unique features: (i) harmonizing the cancer-related information from more than 30 data sources into one integrative platform for easy access; (ii) utilizing a variety of data analysis and graphical tools for enhanced user interaction with the high-dimensional data; (iii) containing a newly developed reporting system for automated annotation and therapy matching for external patient genomic data. A benchmark test indicated that Tri©DB is able to annotate 46% more treatments than two officially recognized resources, OncoKB and MCG. Tri©DB was further shown to achieve 94.9% concordance with administered treatments in a real clinical trial.
CONCLUSIONS: The novel features and rich functionalities of the new platform will facilitate full access to cancer precision medicine data in a single platform and accommodate the needs of a broad range of researchers, not only in translational medicine but also in basic biomedical research. We believe that it will help to promote knowledge sharing in cancer precision medicine. Tri©DB is freely available at www.biomeddb.org and is hosted on a technology architecture that supports all major browsers and mobile handsets.

Subjects

Neoplasms , Precision Medicine , Humans , Precision Medicine/methods , Genomics/methods , Neoplasms/genetics , Neoplasms/therapy , Knowledge Bases

SweepCluster: A SNP clustering tool for detecting gene-specific sweeps in prokaryotes.

Qiu, Junhui; Zhou, Qi; Ye, Weicai; Chen, Qianjun; Bao, Yun-Juan.

BMC Bioinformatics ; 23(1): 19, 2022 Jan 06.

Article in English | MEDLINE | ID: mdl-34991447

ABSTRACT

BACKGROUND: The gene-specific sweep is a selection process in which an advantageous mutation, along with the nearby neutral sites in a gene region, increases in frequency in the population. It has been demonstrated to play important roles in ecological differentiation or phenotypic divergence in microbial populations. Therefore, identifying gene-specific sweeps in microorganisms will not only provide insights into the evolutionary mechanisms, but also unravel potential genetic markers associated with biological phenotypes. However, current methods were mainly developed for detecting selective sweeps in eukaryotic data of sparse genotypes and are not readily applicable to prokaryotic data. Furthermore, some challenges have not been sufficiently addressed by these methods, such as the low spatial resolution of sweep regions and the lack of consideration of the spatial distribution of mutations. RESULTS: We proposed a novel gene-centric and spatial-aware approach for identifying gene-specific sweeps in prokaryotes and implemented it in a Python tool, SweepCluster. Our method searches for gene regions with a high level of spatial clustering of pre-selected polymorphisms in genotype datasets, assuming a null distribution model of neutral selection. The pre-selection of polymorphisms is based on their genetic signatures, such as elevated population subdivision, excessive linkage disequilibrium, or significant phenotype association. Performance evaluation using simulation data showed that the sensitivity and specificity of the clustering algorithm in SweepCluster are above 90%. The application of SweepCluster to two real datasets from the bacteria Streptococcus pyogenes and Streptococcus suis showed that the impact of pre-selection was dramatic and significantly reduced the uninformative signals. We validated our method using the genotype data from Vibrio cyclitrophicus, the only available dataset of gene-specific sweeps in bacteria, and obtained a concordance rate of 78%.
We noted that the concordance rate could be underestimated due to distinct reference genomes and clustering strategies. The application to human genotype datasets showed that SweepCluster is also applicable to eukaryotic data and is able to recover 80% of a catalog of known sweep regions. CONCLUSION: SweepCluster is applicable to a broad category of datasets. It will be valuable for detecting gene-specific sweeps in diverse genotypic data and will provide novel insights into adaptive evolution.

Subjects

Genetic Polymorphism , Genetic Selection , Cluster Analysis , Population Genetics , Genotype , Humans , Linkage Disequilibrium , Genetic Models
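
The SweepCluster abstract above describes testing gene regions for spatial clustering of pre-selected polymorphisms against a neutral null model. The sketch below illustrates that general idea only and is not SweepCluster's algorithm or API. It assumes a deliberately simple null, in which pre-selected SNPs fall uniformly along the genome so that the count inside a region is binomial. The abstract does not specify the tool's actual null distribution, and every function name and number here is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_enrichment(snp_positions, region_start, region_end, genome_length):
    """Count pre-selected SNPs in a gene region and score the clustering.

    Assumed null (an illustration, not the published model): each SNP lands
    in the region with probability region_length / genome_length.
    Returns (number of SNPs in the region, one-sided p-value).
    """
    n = len(snp_positions)
    k = sum(region_start <= pos <= region_end for pos in snp_positions)
    p = (region_end - region_start + 1) / genome_length
    return k, binomial_tail(k, n, p)


# Hypothetical data: four of six pre-selected SNPs sit in a 101-bp region
# of a 10 kb genome, which is very unlikely under the uniform null.
snps = [100, 120, 130, 150, 5000, 9000]
count, p_value = region_enrichment(snps, 100, 200, 10_000)
print(count, p_value)
```

A small p-value flags the region as a candidate for clustering. A real analysis would also need multiple-testing correction across genes, which this sketch omits.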

H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs.

Ye, Weicai; Chen, Ying; Zhang, Yongdong; Xu, Yuesheng.

Bioinformatics ; 33(8): 1130-1138, 2017 Apr 15.

Article in English | MEDLINE | ID: mdl-28087515

ABSTRACT

Motivation: Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose, with over 118 000 citations in the past two decades. As the size of bio-sequence databases grows exponentially, the computational speed of alignment software must be improved. Results: We develop the heterogeneous BLAST (H-BLAST), a fast parallel search tool for a heterogeneous computer that couples CPUs and GPUs, to accelerate the BLASTX and BLASTP-basic tools of NCBI-BLAST. H-BLAST employs a locally decoupled seed-extension algorithm for better performance on GPUs, and offers a performance tuning mechanism for better efficiency across various CPU and GPU combinations. H-BLAST produces the same alignment results as NCBI-BLAST, and its computational speed is much faster than that of NCBI-BLAST. Speedups achieved by H-BLAST over sequential NCBI-BLASTP (resp. NCBI-BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H-BLAST can be faster than 16-threaded NCBI-BLASTX. Furthermore, H-BLAST is 1.5-4 times faster than GPU-BLAST. Availability and Implementation: https://github.com/Yeyke/H-BLAST.git. Contact: yux06@syr.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Computer Graphics , Proteins/chemistry , Sequence Alignment/methods , Algorithms , Amino Acid Sequence , Computers , Nucleic Acid Databases , Software , Time Factors

High speed BLASTN: an accelerated MegaBLAST search tool.

Chen, Ying; Ye, Weicai; Zhang, Yongdong; Xu, Yuesheng.

Nucleic Acids Res ; 43(16): 7762-8, 2015 Sep 18.

Article in English | MEDLINE | ID: mdl-26250111

ABSTRACT

Sequence alignment is a long-standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST, the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST, and its computational speed is much faster than that of MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.

Subjects

Sequence Alignment/methods , Software , Algorithms , Base Sequence , Nucleic Acid Databases , Human Genome , Humans
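
The HS-BLASTN abstract above mentions a lookup table and a seeding step that finds short stretches of identity between query and database. To make the notion of a seed concrete, here is a minimal Python sketch that swaps in a plain k-mer hash table for the lookup. This is an assumption made for illustration: HS-BLASTN itself builds its table from an FMD-index and is written in C++, and none of the names below come from its source code.

```python
from collections import defaultdict


def build_lookup(database, k):
    """Map every k-mer in the database string to its start offsets."""
    table = defaultdict(list)
    for i in range(len(database) - k + 1):
        table[database[i:i + k]].append(i)
    return table


def find_seeds(query, table, k):
    """Return (query_offset, database_offset) pairs of exact k-mer matches."""
    seeds = []
    for q in range(len(query) - k + 1):
        # .get avoids inserting empty entries for k-mers absent from the table.
        for d in table.get(query[q:q + k], ()):
            seeds.append((q, d))
    return seeds


# Hypothetical toy sequences; real seeds use longer words than k = 4.
table = build_lookup("ACGTACGTTT", 4)
print(find_seeds("TACGT", table, 4))
```

In a seed-and-extend search, each seed would then be extended in both directions into a scored alignment. That extension step is outside the scope of this sketch.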

Chemical constituents with antibacterial activity from Euphorbia sororia.

Zhang, Wei-Ku; Xu, Jie-Kun; Zhang, Xiao-Qi; Yao, Xin-Sheng; Ye, Wei-Cai.

Nat Prod Res ; 22(4): 353-9, 2008 Mar 10.

Article in English | MEDLINE | ID: mdl-18322851

ABSTRACT

A group of ceramides (1) was isolated from the aerial parts of Euphorbia sororia. On the basis of spectroscopic data, chemical methods and GC-MS analysis, the structure of 1 was characterised as (2S,3S,4R,8E)-2-(eicosanoyl∼octacosanoyl amino)-1,3,4-octadecanetriol-8-ene. In addition, four known ellagic acid derivatives, 3,3'-di-O-methylellagic acid (2), 3,3',4'-tri-O-methylellagic acid (3), 4-O-sulfooxy-3,3'-di-O-methylellagic acid (4) and 4-O-sulfooxy-3,3',4'-tri-O-methylellagic acid (5), were isolated from the plant. Biological screening of all compounds revealed moderate antibacterial activity.

Subjects

Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Ceramides/chemistry , Ceramides/pharmacology , Euphorbia/chemistry , Bacteria/drug effects , Molecular Structure
